Because distributions are so important to biostatistics, it’s a good practice to prepare a histogram for
every numerical variable you plan to analyze. That way, you can see whether it’s noticeably skewed
and, if so, whether a logarithmic transformation makes the distribution normal enough so you can use
statistics intended for normal distributions on your data.
If you can’t find any transformation that makes your data look even approximately normal, then you
have to analyze your data using nonparametric methods, which don’t assume that your data are
normally distributed.
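To make the idea concrete, here is a minimal sketch in Python of checking a variable before and after a logarithmic transformation. The data are simulated (hypothetical right-skewed lab values, not real measurements), and the helper names sample_skewness and text_histogram are made up for this illustration. Skewness near zero suggests a roughly symmetric distribution, but it is only one rough check and doesn't replace looking at the histogram itself.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Return the skewness of the values: the average cubed z-score."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def text_histogram(values, bins=8, width=40):
    """Return one text line per equal-width bin, scaled to the tallest bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin number `bins`, so clamp it.
        index = min(int((v - lo) / step), bins - 1)
        counts[index] += 1
    tallest = max(counts)
    return [
        f"{lo + i * step:8.2f} | " + "#" * round(width * c / tallest)
        for i, c in enumerate(counts)
    ]


random.seed(1)
# Hypothetical right-skewed values drawn from a log-normal distribution.
raw = [random.lognormvariate(3.0, 0.6) for _ in range(500)]
logged = [math.log(v) for v in raw]

raw_skew = sample_skewness(raw)
log_skew = sample_skewness(logged)

print("Raw values, skewness =", round(raw_skew, 2))
print("\n".join(text_histogram(raw)))
print("Log-transformed values, skewness =", round(log_skew, 2))
print("\n".join(text_histogram(logged)))
```

If the log-transformed histogram looks roughly bell-shaped and its skewness is close to zero, the transformation has probably done its job for this variable.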
Summarizing grouped data with bars, boxes, and whiskers
Sometimes you want to show how a numerical variable differs from one group of participants to
another. For example, blood levels of a certain cardiovascular enzyme vary among the cardiology
patients at four different clinics: Clinic A, B, C, and D. Two types of graphs are commonly used for
this purpose: bar charts and box-and-whiskers plots.
Bar charts
One simple way to display and compare the means of several groups of data is with a bar chart, like
the one shown in Figure 9-7a. Here, the bar height for each group of patients equals the mean (or
median, or geometric mean) value of the enzyme level for patients at the clinic represented by the bar.
The bar chart becomes even more informative if you indicate the spread of values for each clinic's
sample by placing lines representing one SD above and below the tops of the bars, as shown in Figure
9-7b. These lines are called error bars, an unfortunate name that can cause confusion because it
seems to suggest a mistake. Here, error refers to statistical error (described in Chapter 6), not to
an error in the data.
© John Wiley & Sons, Inc.
FIGURE 9-7: Bar charts showing mean values (a) and standard deviations (b).
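The numbers behind a chart like Figure 9-7b are just a mean and an SD for each group. Here is a minimal sketch in Python, using made-up enzyme levels for the four clinics (the values, the variable enzyme_by_clinic, and the helper summarize are all hypothetical and exist only for this illustration).

```python
import statistics

# Hypothetical enzyme levels (arbitrary units) for five patients per clinic.
enzyme_by_clinic = {
    "Clinic A": [90, 100, 110, 100, 100],
    "Clinic B": [120, 135, 128, 140, 117],
    "Clinic C": [80, 85, 78, 150, 82],
    "Clinic D": [105, 99, 111, 108, 102],
}


def summarize(groups):
    """Map each group name to a (mean, sample SD) pair."""
    return {
        name: (statistics.mean(values), statistics.stdev(values))
        for name, values in groups.items()
    }


summary = summarize(enzyme_by_clinic)
for name, (mean, sd) in summary.items():
    # The bar height is the mean; the error bar spans one SD on either side.
    print(f"{name}: bar height = {mean:.1f}, "
          f"error bar from {mean - sd:.1f} to {mean + sd:.1f}")
```

If you have a plotting library such as matplotlib available, you can pass the means as bar heights and the SDs as the yerr argument of its bar function to draw the chart itself. Notice that the made-up Clinic C sample contains one unusually high value: it inflates both the mean and the SD, but the summary alone gives you no way to tell that a single patient is responsible.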
But even with error bars, a bar chart still doesn’t provide a picture of the distribution of enzyme
levels within each group. Are the values skewed? Are there outliers? You could make a histogram for
each group of patients (Clinic A, Clinic B, Clinic C, and Clinic D), but four histograms would take
up a lot of space. There is a solution for this! Keep reading to find out what it is.
Box-and-whiskers charts
The box-and-whiskers plot (or B&W, or just box plot) uses very little space to display a lot of
information about the distribution of numbers in one or more groups of participants. A box plot of the